Model compression as constrained optimization, with application to neural nets. Part II: quantization
Authors
Abstract
We consider the problem of deep neural net compression by quantization: given a large, reference net, we want to quantize its real-valued weights using a codebook with K entries so that the training loss of the quantized net is minimal. The codebook can be optimally learned jointly with the net, or fixed, as in binarization or ternarization approaches. Previous work has quantized the weights of the reference net, or incorporated rounding operations into the backpropagation algorithm, but this has no guarantee of converging to a loss-optimal, quantized net. We describe a new approach based on the recently proposed framework of model compression as constrained optimization (Carreira-Perpiñán, 2017). This results in a simple iterative “learning-compression” algorithm, which alternates a step that learns a net of continuous weights with a step that quantizes (or binarizes/ternarizes) the weights, and is guaranteed to converge to a local optimum of the loss for quantized nets. We develop algorithms for an adaptive codebook or a (partially) fixed codebook. The latter includes binarization, ternarization, powers-of-two and other important particular cases. We show experimentally that we can achieve much higher compression rates than previous quantization work (even using just 1 bit per weight) with negligible loss degradation.
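The compression step of the learning-compression alternation described above can be illustrated with a small sketch. Assuming an adaptive codebook, quantizing the weights with a K-entry codebook that minimizes squared distortion amounts to running 1D k-means over the weight values; the sketch below (NumPy, with a hypothetical `c_step` helper) shows only this compression step, omitting the learning step that retrains the continuous weights with a penalty toward the quantized ones:

```python
import numpy as np

def c_step(weights, K, n_iter=20):
    """Compression (C) step sketch: fit a K-entry codebook to the
    current real-valued weights via 1D k-means, then replace each
    weight with its nearest codebook entry."""
    w = weights.ravel()
    # initialize the codebook from quantiles of the weight distribution
    codebook = np.quantile(w, np.linspace(0.0, 1.0, K))
    for _ in range(n_iter):
        # assignment: index of the nearest codebook entry per weight
        idx = np.argmin(np.abs(w[:, None] - codebook[None, :]), axis=1)
        # update: each entry becomes the mean of its assigned weights
        for k in range(K):
            if np.any(idx == k):
                codebook[k] = w[idx == k].mean()
    return codebook[idx].reshape(weights.shape), codebook

# usage: quantize a random weight matrix to K = 4 values (2 bits/weight)
rng = np.random.default_rng(0)
w = rng.normal(size=(4, 8))
w_quantized, cb = c_step(w, K=4)
```

A fixed codebook (e.g. binarization to {−1, +1}) corresponds to skipping the codebook update and keeping only the nearest-entry assignment.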
Related papers
Model compression as constrained optimization, with application to neural nets. Part I: general framework
Compressing neural nets is an active research problem, given the large size of state-of-the-art nets for tasks such as object recognition, and the computational limits imposed by mobile devices. We give a general formulation of model compression as constrained optimization. This includes many types of compression: quantization, low-rank decomposition, pruning, lossless compression and others. T...
Model compression as constrained optimization, with application to neural nets
Compressing neural nets is an active research problem, given the large size of state-of-the-art nets for tasks such as object recognition, and the computational limits imposed by mobile devices. Firstly, we give a general formulation of model compression as constrained optimization. This makes the problem of model compression well defined and amenable to the use of modern numerical optimization ...
Optimal Neural Net Compression via Constrained Optimization
Compressing neural nets is an active research problem, given the large size of state-of-the-art nets for tasks such as object recognition, and the computational limits imposed by mobile devices. Firstly, we give a general formulation of model compression as constrained optimization. This makes the problem of model compression well defined and amenable to the use of modern numerical optimization...
APPLICATION NEURAL NETWORK TO SOLVE ORDINARY DIFFERENTIAL EQUATIONS
In this paper, we introduce a hybrid approach based on a neural network and an optimization technique to solve ordinary differential equations. In the proposed model, we use a hyperbolic secant transfer function in the hidden layer of the neural-network part and the BFGS technique in the optimization part. In comparison with existing similar neural networks, the proposed model provides solutions with high accuracy. Numerica...
Extremely Low Bit Neural Network: Squeeze the Last Bit Out with ADMM
Although deep learning models are highly effective for various tasks, such as detection and classification, the high computational cost prohibits the deployment in scenarios where either memory or computational resources are limited. In this paper, we focus on model compression and acceleration of deep models. We model a low bit quantized neural network as a constrained optimization problem. Th...
Journal: CoRR
Volume: abs/1707.04319
Published: 2017